

      PATTERN-MATCHING RULES FOR THE RECOGNITION OF
         NATURAL LANGUAGE DIALOGUE EXPRESSIONS

		Kenneth Mark Colby
                      and
	        Roger C. Parkison


	            INTRODUCTION

	To recognize something is to identify it as  an  instance  of
the "same again".   This familiarity is possible because of recurrent
characteristics of the world which repeat themselves  over  and  over
again.      We shall describe an algorithm which recognizes recurrent
characteristics of natural language dialogue expressions. It utilizes
a  multi-stage  sequence  of pattern-matching rules for progressively
transforming these input expressions into a pattern which  eventually
best  matches  a more abstract stored pattern. The name of the stored
pattern has a pointer to the name of a response  function  in  memory
which decides what to do once the input has been recognized.     Here
we shall discuss only  the  recognizing  functions,  except  for  one
response  function  (anaphoric substitution) which interactively aids
the recognition process.   Details  of  how  the  response  functions
operate  will  be  described  in  a  future  communication by William
Faught.
	In   constructing   and  testing  a  simulation  of  paranoid
processes, we were faced with the  problem  of  reproducing  paranoid
linguistic  behavior in a teletyped diagnostic psychiatric interview.
The diagnosis of paranoid states,  reactions  or  modes  is  made  by
clinicians  who  judge  a  degree of correspondence between what they
observe linguistically in an interview and their conceptual model  of
paranoid  behavior.  There  exists  a  high degree of agreement among
psychiatrists about this conceptual model which relies mainly on what
an interviewee says and how he says it.
	Natural language is a life-expressing  code  people  use  for
communication  with  themselves  and others.  In a real-life dialogue
such as a psychiatric interview,  the  participants  have  interests,
intentions,  and  expectations which are revealed in their linguistic
expressions.    To produce effects on an interviewer which he would
judge similar to the effects produced by a paranoid patient, an
interactive  simulation  of  a  paranoid  patient  must  be  able  to
demonstrate  typical  paranoid  interview  behavior.  To  achieve the
desired effects, the paranoid model must have  the  ability  to  deal
with the teletyped linguistic behavior of the interviewer.
	There  are  a  number of approaches one might consider for an
ideal handling of dialogue expressions.       One approach  would  be
to  construct  a  dictionary  of all expressions which could possibly
arise in an interview.  Associated with each expression would be  its
interpretation,  depending  on  dialogue  context, which in turn must
somehow be defined. For obvious reasons, no one takes  this  approach
seriously.       Instead  of  an  expression  dictionary,  one  might
construct a word dictionary and then use projection rules to yield an
interpretation  of a sentence from the dictionary definitions.   This
has been the approach adopted by most  linguistic  parsers  based  on
transformational or systemic  grammars.     Such  a  method  performs
adequately  as  long  as  the  dictionary involves only a few hundred
words, each word having only one or two senses, and the  dialogue  is
limited  to  a mini-world of a few objects and relations.     But the
problems which arise in a real-time psychiatric  interview  conducted
in unrestricted English are too great for this method to be useful in
actual dialogues requiring immediate responses.
	There is little consensus about how humans process
natural language.    They can be shown to possess some  knowledge  of
grammar  rules  but  this  does not entail that they use a grammar in
interpreting and producing language. It seems implausible to us  that
people   possess   full   transformational  grammars  for  processing
language.     Language  is  what  is  recognized  but  the  processes
involved   may  not  be  linguistic  or  grammatical.      Originally
transformational grammars were not designed to "understand"  a  large
subset  of  English;  they  represented  a formal method for deciding
whether a string is grammatical.
	An analysis of what one's problem actually  is  should  guide
the  selection  or invention of methods appropriate to that problem's
solution.  Our problem was not to develop a  consistent  and  general
theory  of  language  nor  to  assert empirically testable hypotheses
about how people process language.   Our problem  was  to  design  an
algorithm  which recognizes what is being said in a dialogue and what
is being said about it in order to make a response such that a sample
of I-O pairs from the paranoid model is judged similar to a sample of
I-O pairs from paranoid  patients.    The  design  task  was  one  of
artificial  intelligence  in  which  attempts  are  made  to  program
computers to perform human mind-like functions.  New methods  had  to
be  devised  for an algorithm to participate in a human dialogue in a
paranoid-patient-like way.  We are  not  claiming  that  our  methods
represent  the way people process language.       We sought effective
methods which could operate efficiently  in  real  time.   Since  our
methods  provide  a  general  way of many-to-one mapping from surface
expressions to a single stored pattern, these methods can be used  by
any type of "host" system which takes natural language as input.
	To perform  the  task  of  managing  communicative  uses  and
effects  of natural language, we developed a strategy of transforming
the input until a pattern is achieved  which  matches  completely  or
partially  a  more abstract stored pattern.  This strategy has proved
adequate for our purposes a satisfactory percentage of the time.  (No
one  expects  any  method to be successful 100% of the time since not
even humans, the best natural  language  processors  around,  achieve
this  level  of  performance.)  The power of this method for natural
language dialogues lies in its ability to ignore as  irrelevant  some
of what it recognizes and everything it does not recognize at all.  A
conventional  parser  doing  word-by-word,  parts-of-speech  analysis
fails  when  it  cannot  find  one  or more of the input words in its
dictionary. Because it must know every word, it is  too  fragile  for
unrestricted dialogues.
	In early versions of the paranoid model (PARRY1), many (but
not all) of the pattern recognition mechanisms allowed the elements
of the pattern to be order-independent (Colby, Weber, and Hilf,
1971).  For example, consider the following expressions:
	(1) WHERE DO YOU WORK?
	(2) WHAT SORT OF WORK DO YOU DO? 
	(3) WHAT IS YOUR OCCUPATION? 
	(4) WHAT DO YOU DO FOR A LIVING? 
	(5) WHERE ARE YOU EMPLOYED? 
In PARRY1 a procedure would scan these  expressions  looking  for  an
information-bearing  contentive  such as "work", "for a living", etc.
If it found such a contentive along with a "you"  or  "your"  in  the
expression,  regardless  of  word  order,  it  would  respond  to the
expression as if it were a question about the nature of  one's  work.
This   method   correctly   classifies   the   5   sentences   above.
Unfortunately, it includes the 2 examples below in the same category:
	(6) DOES YOUR FATHER'S CAR WORK?
	(7) HOW DID THINGS WORK OUT FOR YOU?
An insensitivity to word order has the advantage that lexical items
representing different parts of speech can represent the same
concept, e.g. "work" as noun or as verb. But a price is paid for this
resilience and elasticity.  We found from experience that, since
English relies heavily on word order to convey the meaning of its
messages, the average penalty of misunderstanding (to be
distinguished from ununderstanding) was too great. Hence in PARRY2,
as will be described shortly, the patterns require a  specified  word
order.
	For   high-complexity   problems  it  is  necessary  to  have
constraints.       Diagnostic psychiatric interviews (and  especially
those  conducted  over  teletypes)  have several natural constraints.
First, clinicians are trained to ask  certain  questions  in  certain
ways.    This  limits  the  number  of patterns required to recognize
utterances about each topic. Second,  only  a  few  hundred  standard
topics are brought up by interviewers who are trained to use everyday
expressions and especially those used by the  patient  himself.  When
the  interview  is  conducted  over teletypes, expressions tend to be
shortened since the interviewer tries  to  increase  the  information
transmission  rate  over  the slow channel of a teletype.    Finally,
teletyped interviews represent  written  utterances.  Utterances  are
known  to  be  highly redundant and unrecognized words can be ignored
without losing the meaning of the message. Utterances are loaded with
idioms, cliches, pat phrases, etc., all of which are easy prey for a
pattern-matching approach.   It is time-wasting and usually futile to
try  to  decode  an idiom by analyzing the meanings of its individual
words.
	We  shall  now describe the pattern-matching functions of the
algorithm in some detail. (See Fig. 1 for a diagram  of  the  overall
flow of control).


		    OVERVIEW

	PARRY2 has  two  primary  modules.   The  first  attempts  to
RECOGNIZE the input and the second RESPONDS.  This paper is primarily
about the  RECOGNIZE  module.   It  functions  independently  of  the
RESPOND  module  except  in the case of pronoun references, which the
RESPOND module provides to the RECOGNIZER on request.
	The recognition module has 4 main steps:
	1) Identify the words in the question and convert them to
		internal synonyms.
	2) Break the input into segments at certain bracketing words.
	3) Match each segment (independently) to a stored pattern.
	4) Match the resulting list of recognized segments to a stored
		  compound pattern.
Each  of  these  steps,  except  the  segmenting, throws away what it
cannot identify.  When an utterance refers to a  topic  which  PARRY2
has some ability to recognize, this increases the chance that it will
ultimately be recognized. Occasionally, a  reference  to  an  unknown
topic is mis-recognized as some familiar topic.
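	The overall flow can be suggested by the following sketch, in
present-day Python rather than the notation of the actual program.
All names, dictionary entries, and patterns below are illustrative
assumptions, not PARRY2's own.

    # Toy data standing in for the run-time dictionary and pattern stores.
    DICTIONARY = {"what": "WHAT", "is": "BE", "are": "BE", "your": "YOU",
                  "you": "YOU", "occupation": "JOB", "work": "JOB",
                  "why": "WHY", "in": "IN", "the": None, "hospital": "HOSPITAL"}
    BRACKETS = {"WHAT", "WHY", "IN"}          # words that start a new segment
    SIMPLE = {("WHAT", "BE", "YOU", "JOB"): "QUERY-JOB",
              ("WHY", "BE", "YOU"): "QUERY-WHY",
              ("IN", "HOSPITAL"): "TOPIC-HOSPITAL"}
    COMPOUND = {("QUERY-WHY", "TOPIC-HOSPITAL"): "WHY-IN-HOSPITAL"}

    def recognize(utterance):
        # Step 1: translate known words to internal synonyms, drop the rest.
        words = [DICTIONARY[w] for w in utterance.lower().rstrip("?.").split()
                 if DICTIONARY.get(w)]
        # Step 2: break the word list into segments at bracketing words.
        segments, seg = [], []
        for w in words:
            if w in BRACKETS and seg:
                segments.append(tuple(seg))
                seg = []
            seg.append(w)
        if seg:
            segments.append(tuple(seg))
        # Step 3: match each segment independently; discard failures.
        names = tuple(SIMPLE[s] for s in segments if s in SIMPLE)
        # Step 4: match the list of recognized segments to a compound pattern.
        if len(names) == 1:
            return names[0]
        return COMPOUND.get(names, "DEFAULT")

    print(recognize("What is your occupation?"))      # QUERY-JOB
    print(recognize("Why are you in the hospital?"))  # WHY-IN-HOSPITAL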

 		    PREPROCESSING

	Each word in the input expression is first looked up in a
dictionary of (currently) about 1900 entries which, for the sake of
speed, is maintained in core during run-time. (The complete  language
recognition  process  requires less than one second of real time on a
time-shared DEC PDP-10). The dictionary, which was built  empirically
from  thousands of teletyped interviews with previous versions of the
model, consists of words, groups of words, and names of  word-classes
they  can  be translated into. (See Fig.  2 for illustrative examples
of dictionary entries).  The size of the dictionary is determined  by
the patterns, i.e. a word is not included unless it appears in one or
more patterns.  Entries  in  the  dictionary  reflect  PARRY2's  main
interests.  If  a  word  in the input is not in the dictionary, it is
dropped from the pattern being formed. Thus if the input were:
	WHAT IS YOUR CURRENT OCCUPATION?
and  the  word  "current"  were not in the dictionary, the pattern at
this stage becomes:
	( WHAT IS YOUR OCCUPATION )   
The question-mark is thrown away as  redundant  since  questions  are
recognized  by  word  order. (A statement followed by a question mark
(YOU GAMBLE?) is responded to in  the  same  way  as  that  statement
followed  by  a  period.) Synonymic translations of words are made so
that the pattern becomes, for example:
	( WHAT  BE  YOU  JOB )   
	Some groups of words (i.e. idioms) are translated as a  group
so that, for example, "for a living" becomes "for job". Certain other
juxtaposed words are contracted into a single word,  e.g.  "place  of
birth"  becomes  "birthplace".  This  is  done to deal with groups of
words which are  represented  as  a  single  element  in  the  stored
pattern,  thereby preventing segmentation from occurring at the wrong
places, such as at a preposition inside an idiom or  phrase.  Besides
these  contractions, certain expansions are made so that for example,
"DON'T" becomes "DO NOT" and "I'D" becomes "I WOULD".
	Misspellings  can  be the bane of teletyped interviews for an
algorithm.  Here  they  are  handled  in  two  ways.  First,   common
misspellings  of  important  words  are simply put in the dictionary.
Thus "yoy" is known to mean "you". The apostrophe  is  often  omitted
from contractions so most contractions are recognized with or without
it. These common misspellings were gathered from over 4000 interviews
with earlier versions of the paranoid model.
	Second,  there  are  5 common forms of typing error which are
checked systematically.  These are:
	1) Doubled letter
	2) Extraneous letter
	3) Forgetting to hold the "shift key" for an apostrophe
	4) Hitting a nearby key on the keyboard
	5) Transposing two letters in a word
The first three errors can be corrected  by  deleting  the  offending
character  from  the  word.   This  is  accomplished by deleting each
character in turn until the  word  becomes  a  recognized  word.  The
fourth  type  of error is only checked for 10 of the more common near
misses.    These were also empirically determined and include  letter
pairs  like  (T  Y), (O P), (A S), and (N M).   These methods are all
based on typing errors, but they also correct some legitimate English
spelling  errors.   Two-letter  transposition  corrects, for example,
"beleive" to "believe".

                      SEGMENTING

	Another  weakness in the crude pattern matching of PARRY1 was
that it took the entire input  expression  as  its  basic  processing
unit.  Hence  if  only  two  words  were  recognized in an eight-word
utterance, the risk of misunderstanding was great. We needed a way of
dealing with units shorter than the entire input expression.
	Expert  telegraphists  stay  six  to  twelve  words  behind a
received message before transcribing it.  Translators wait until they
have  heard  4-6  words  before  they  begin translating.  Aided by a
heuristic from work in machine translation (Wilks, 1973), we devised
a  way  of  bracketing  the pattern constructed up to this point into
shorter segments using prepositions, wh-forms, certain verbs, etc. as
bracketing points.  (A list of the bracketing terms appears  in  Fig.
3).   These  tend  to  separate  prepositional  phrases  and embedded
clauses from the main clause.   The  new  pattern  formed  is  termed
either "simple", having no delimiters within it, or "compound", i.e.,
being made up of two or more simple patterns. A simple pattern  might
be:
	( WHAT BE YOU JOB )
whereas a compound pattern would be:
	(( WHY BE YOU ) ( IN HOSPITAL ))
Our experience with this method of segmentation shows  that  compound
patterns  from teletyped psychiatric dialogues rarely consist of more
than three or four segments. 
	After certain verbs (See  Fig.  4)  a  bracketing  occurs  to
replace the commonly omitted "THAT", such that:
	( I THINK YOU BE AFRAID )
becomes
	(( I THINK ) ( YOU BE AFRAID ))
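	A sketch of the segmenter: here PRE_BRACKETS stands in for the
prepositions and wh-forms of Fig. 3, which open a new segment at the
word itself, and POST_BRACKETS for the verbs of Fig. 4, which close
the segment just after the word. Both lists are tiny illustrative
samples.

    PRE_BRACKETS = {"IN", "WHY", "WHAT", "TO"}
    POST_BRACKETS = {"THINK", "BELIEVE"}

    def segment(words):
        segments, seg = [], []
        for w in words:
            if w in PRE_BRACKETS and seg:   # break before the word
                segments.append(seg)
                seg = []
            seg.append(w)
            if w in POST_BRACKETS:          # break after the word
                segments.append(seg)
                seg = []
        if seg:
            segments.append(seg)
        return segments

    print(segment(["I", "THINK", "YOU", "BE", "AFRAID"]))
    # [['I', 'THINK'], ['YOU', 'BE', 'AFRAID']]
    print(segment(["WHY", "BE", "YOU", "IN", "HOSPITAL"]))
    # [['WHY', 'BE', 'YOU'], ['IN', 'HOSPITAL']]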

		   MATCHING INDIVIDUAL SEGMENTS

	Conjunctions serve only as markers for the segmenter and they
are dropped out after segmentation.
	Negations are  handled  by  extracting  the  "NOT"  from  the
segment  and  assigning  a value to a global variable which indicates
that the expression is negative in form. When a  pattern  is  finally
matched,  this variable is consulted. Some patterns have a pointer to
a pattern  of  opposite  meaning  if  a  "NOT"  could  reverse  their
meanings.      If this pointer is present and a "NOT" was found, then
the pattern matched is replaced by its opposite, e.g.  ( I not  trust
you  ) is replaced by the pattern ( I mistrust you ). We have not yet
observed the troublesome  case  of  "he  gave  me  not  one  but  two
messages". (There is no need to scratch where it doesn't itch).
	Substitutions are also made in certain cases.  Some  segments
contain  pronouns  which could stand for a number of different things
of importance to PARRY2.       As we mentioned in  the  introduction,
there  is  one  case in which the response functions of memory aid in
language recognition.   One function of memory is to  keep  track  of
the  context  in order to give pronouns and other anaphoras a correct
interpretation.  For example, the segment:
	( DO YOU AVOID THEM )
could refer to the Mafia, or racetracks, or other patients, depending
on the context.  When such a segment is encountered,  the  pronoun  is
replaced by its current anaphoric value as determined by the response
functions, and a more specific segment such as:
	( DO YOU AVOID MAFIA )
is  looked  up.
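	A sketch of the substitution, with a hypothetical context table
standing in for the response functions that actually track referents:

    CONTEXT = {"THEM": "MAFIA", "IT": "HOSPITAL"}   # current anaphoric values

    def substitute_anaphora(segment):
        # Replace each pronoun by its current referent, if one is known.
        return tuple(CONTEXT.get(w, w) for w in segment)

    print(substitute_anaphora(("DO", "YOU", "AVOID", "THEM")))
    # ('DO', 'YOU', 'AVOID', 'MAFIA')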
	There are other utterances, such as "Why did you do that?" or
just  "Why?"  (which  might be regarded as a massive ellipsis), which
clearly refer back to previous  utterances.  These  utterances  match
very  general  patterns  which  identify the type of question without
indicating the exact topic. The response function which  responds  to
"Why?" consults the context to produce an appropriate answer.
	The algorithm now attempts to match the segments with  stored
simple  patterns  which currently number about 1700. First a complete
and perfect match is sought.   When a  match  is  found,  the  stored
pattern  name  has  a  pointer  to the name of a response function in
memory which decides what to do further.  If a match  is  not  found,
further  transformations of the segment are carried out and a "fuzzy"
match is tried.
	For  fuzzy  matching  at this stage, we adopted the heuristic
rule of dropping elements in the segment one at a time and attempting
a match each time.   This heuristic allows ignoring familiar words in
unfamiliar contexts.   For example, "well" is important in  "Are  you
well?" but meaningless in "Well are you?".
	Deleting one element at a time results, for example, in the
pattern:
		( WHAT BE YOU MAIN PROBLEM )
becoming successively:
		(a) ( BE YOU MAIN PROBLEM )
		(b) ( WHAT YOU MAIN PROBLEM )
		(c) ( WHAT BE MAIN PROBLEM )
		(d) ( WHAT BE YOU PROBLEM )
		(e) ( WHAT BE YOU MAIN )
Since the stored pattern in this case matches (d), (e) would  not  be
constructed.   We  found  it  unwise  to delete more than one element
since our segmentation method usually yields  segments  containing  a
small (1-4) number of words.
	Dropping  an  element  at  a  time  provides  a   probability
threshold  for  fuzzy   matching which is a function of the length of
the segment. If a segment consists of five elements, four of the five
must be present in a particular order (with the fifth element missing
in any position) for a match to occur. If  a  segment  contains  four
elements, three must match, and so forth.
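	The fuzzy match can be sketched directly; PATTERNS is an
illustrative stand-in for the roughly 1700 stored simple patterns.

    PATTERNS = {("WHAT", "BE", "YOU", "PROBLEM"): "QUERY-PROBLEM"}

    def fuzzy_match(segment):
        seg = tuple(segment)
        if seg in PATTERNS:               # complete and perfect match
            return PATTERNS[seg]
        for i in range(len(seg)):         # drop one element at a time
            reduced = seg[:i] + seg[i + 1:]
            if reduced in PATTERNS:
                return PATTERNS[reduced]
        return None                       # no match; default condition

    print(fuzzy_match(("WHAT", "BE", "YOU", "MAIN", "PROBLEM")))
    # 'QUERY-PROBLEM'  (matched once "MAIN" is deleted)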

		   COMPOUND-PATTERN MATCH

	When more than one simple pattern is detected in the input, a
second  matching  is  attempted  against about 500 compound patterns.
Certain patterns, such as ( HELLO ) and (  I  THINK  ),  are  dropped
because  they  are considered meaningless. If a complete match is not
found, then simple patterns are dropped, one  at  a  time,  from  the
compound pattern. This allows the input,
	(( HOW DO YOU COME ) ( TO BE ) ( IN HOSPITAL ))
to  match  the  stored  pattern,
	(( HOW  DO  YOU COME ) ( IN HOSPITAL )).
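	This stage mirrors the fuzzy match over simple patterns. A
sketch, with invented segment and compound-pattern names:

    MEANINGLESS = {"HELLO", "I-THINK"}
    COMPOUND = {("HOW-COME", "IN-HOSPITAL"): "WHY-IN-HOSPITAL"}

    def match_compound(names):
        # Drop segments considered meaningless, then try a perfect match.
        names = tuple(n for n in names if n not in MEANINGLESS)
        if names in COMPOUND:
            return COMPOUND[names]
        for i in range(len(names)):       # drop one segment at a time
            reduced = names[:i] + names[i + 1:]
            if reduced in COMPOUND:
                return COMPOUND[reduced]
        return None                       # default condition

    print(match_compound(("HOW-COME", "TO-BE", "IN-HOSPITAL")))
    # 'WHY-IN-HOSPITAL'  (after the ( TO BE ) segment is dropped)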

	If  no  match  can  be found at this point, the algorithm has
arrived at a default condition and the appropriate response functions
decide  what  to  do.  For example, in a default condition, the model
may assume  control  of  the  interview,  asking  the  interviewer  a
question, continuing with the topic under discussion or introducing a
new topic.
	An annotated example of a diagnostic psychiatric interview is
presented in the appendix.


		   ADVANTAGES AND LIMITATIONS

	As   mentioned,   one   of   the   main   advantages   of   a
pattern-matching strategy is that it can ignore  as  irrelevant  both
some  of  what  it  recognizes and what it does not recognize at all.
There are several million words in English, each  possessing  one  to
over   a   hundred  senses.    To  construct  a  machine-usable  word
dictionary of this magnitude is out of the  question  at  this  time.
Recognition of natural language input, such as described above, allows
real-time  interaction  in  a  dialogue  since  it  avoids   becoming
ensnarled   in  combinatorial  disambiguations  and  long  chains  of
inferencing  which  would  slow  a   dialogue   algorithm   down   to
impracticality,  if it could even function at all. The price paid for
pattern-matching is that  sometimes,  but  rarely,  ambiguities  slip
through.   Eventually a rewrite interpreter must be added to the
pattern-matching rules to deal with ambiguities (Enea and Colby, 1973).
	A drawback to PARRY1 was that it reacted to the first pattern
it  found  in the input rather than characterizing the input as fully
as possible and then deciding what to do based on a number of  tests.
Another practical difficulty with PARRY1, from a programmer's
viewpoint, was that elements of the patterns were strung out in
various  procedures  throughout  the  algorithm.    It  was  often  a
considerable chore for the programmer to determine  whether  a  given
pattern   was   present   and   precisely   where   it  was.  In  the
above-described method, the patterns are all collected in one part of
the data-base where they can easily be examined.
	Concentrating all the patterns in the data-base gives PARRY2
a  limited  "learning"  ability.   When  an  input fails to match any
stored pattern or matches an incorrect one,  as  judged  by  a  human
operator,  a  pattern  which  matches  the  input can be put into the
data-base automatically.  If the new pattern has the same meaning  as
a previously stored pattern, the human operator must provide the name
of the appropriate response function.  If  he  doesn't  remember  the
name,  he  may  try  to  rephrase the input in a form recognizable to
PARRY2 and it will name the response  function  associated  with  the
rephrasing.  These mechanisms are not "learning" in the commonly used
sense but they do allow a  person  to  transfer  his  knowledge  into
PARRY2's data-base with very little  effort.
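	The mechanism can be sketched as follows; the function and its
arguments are hypothetical, standing in for the interactive exchange
with the human operator.

    def acquire(patterns, failed_input, response_name=None, rephrasing=None):
        # Bind the mis-recognized input either to a response function the
        # operator names directly, or to the one already associated with
        # an operator-supplied rephrasing that PARRY2 does recognize.
        if response_name is None and rephrasing is not None:
            response_name = patterns.get(tuple(rephrasing))
        if response_name is not None:
            patterns[tuple(failed_input)] = response_name
        return patterns

    patterns = {("WHAT", "BE", "YOU", "JOB"): "ANSWER-JOB"}
    acquire(patterns, ("WHAT", "BE", "YOU", "TRADE"),
            rephrasing=("WHAT", "BE", "YOU", "JOB"))
    # patterns now maps ( WHAT BE YOU TRADE ) to ANSWER-JOB as well.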
	Informal  observation  thus  far  shows  PARRY2's  linguistic
recognition  abilities  to  be  quite  superior  to  PARRY1's. A more
systematic and quantitative evaluation of performance will be carried
out in the near future.


                      REFERENCES

Colby, K.M., Weber, S., and Hilf, F.D. (1971). Artificial Paranoia.
	ARTIFICIAL INTELLIGENCE, 2, 1-25.

Enea, H. and Colby, K.M. (1973). Idiolectic language-analysis
       for understanding doctor-patient dialogues. THIRD
       INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE,
       278-284.

Wilks, Y. (1973). An artificial intelligence approach to machine
	translation. In COMPUTER MODELS OF THOUGHT AND 
	LANGUAGE, Schank, R. and Colby, K.M., Eds., W.H. Freeman,
	San Francisco.